Releasing Private Contingency Tables
Authors
Abstract
Statistical agencies such as the US Census Bureau routinely release aggregate statistics about the general population. These statistics are often reported in the form of contingency tables. A 2-dimensional contingency table is an (m + 1) × (n + 1) matrix over two attributes that are binned into m rows and n columns. For instance, the attributes could be Age binned into buckets of 10 years and Height binned into buckets of 5 cm. For each cell (i, j) in the matrix, the table reports an aggregate value called the cell value. This is often an aggregate of some private attribute of the population. For example, the table may report the total number of individuals with diabetes in each cell of the table. In this case, the private attribute being aggregated is Boolean: 1 if the individual has diabetes and 0 otherwise. The final row i = m + 1 (resp., column j = n + 1) contains the total row (resp., column) sums, e.g., the total number of individuals in each row who have diabetes. Besides the cell values, the total number of individuals that fall in each cell is also publicly known, e.g., the total number of individuals in the age range 10-20 years and height range 165-170 cm, regardless of whether or not they have diabetes.
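The table layout described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `build_tables`, the 0-based bin indices, and the sample records are hypothetical. The sketch assumes one reading of the margins, in which the last column holds each row's total and the last row holds each column's total, and it additionally assumes that the corner cell holds the grand total, which the abstract does not state.

```python
def build_tables(records, m, n):
    """Build the cell-value table and the public per-cell counts.

    records: iterable of (row_bin, col_bin, private_bit) with 0-based bins,
             where private_bit is 1 (e.g. has diabetes) or 0.
    Returns (values, counts):
      values is (m + 1) x (n + 1): cell sums plus marginal totals,
      counts is m x n: number of individuals per cell (publicly known).
    """
    values = [[0] * (n + 1) for _ in range(m + 1)]
    counts = [[0] * n for _ in range(m)]
    for i, j, bit in records:
        values[i][j] += bit   # cell value: sum of the private attribute
        counts[i][j] += 1     # public count, independent of the attribute
        values[i][n] += bit   # last column: total for row i
        values[m][j] += bit   # last row: total for column j
        values[m][n] += bit   # corner: grand total (assumption)
    return values, counts


# Hypothetical data: 2 age bins x 2 height bins, five individuals.
records = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 1), (1, 1, 0)]
values, counts = build_tables(records, m=2, n=2)
for row in values:
    print(row)
```

Keeping `counts` separate from `values` mirrors the distinction the abstract draws between the aggregated private attribute and the publicly known number of individuals in each cell.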
Similar resources
Small Contingency Tables with Large Gaps
We construct examples of contingency tables on n binary random variables where the gap between the linear programming lower/upper bound and the true integer lower/upper bounds on cell entries is exponentially large. These examples provide evidence that linear programming may not be an effective heuristic for detecting disclosures when releasing margins of multi-way tables.
Differentially Private Publication of Sparse Data
The problem of privately releasing data is to provide a version of a dataset without revealing sensitive information about the individuals who contribute to the data. The model of differential privacy allows such private release while providing strong guarantees on the output. A basic mechanism achieves differential privacy by adding noise to the frequency counts in the contingency tables (or, ...
Plain Answers to Several Questions about Association/Independence Structure in Complete/Incomplete Contingency Tables
In this paper, we develop some results based on the Relational model (Klimova, et al. 2012), which permits a decomposition of the logarithm of expected cell frequencies under a log-linear type model. These results imply plain answers to several questions in the context of analyzing contingency tables. Moreover, determination of design matrix and hypothesis-induced matrix of the model will be discusse...
Analysis of Dynamic Longitudinal Categorical Data in Incomplete Contingency Tables Using Capture-Recapture Sampling: A Case Study of Semi-Concentrated Doctoral Exam
Abstract. In this paper, dynamic longitudinal categorical data and estimation of their parameters in incomplete contingency tables are evaluated. To apply the proposed method, a study has been conducted on the data of the semi-concentrated doctoral exam of the National Organization for Educational Testing (NOET). The results of studies such as the obtained confidence intervals and calculating t...
Partial Association Components in Multi-way Contingency Tables and Their Statistical Analysis
In analyses of contingency tables made up of categorical variables, the study of relationship between the variables is usually the major objective. So far, many association measures and association models have been used to measure the association structure present in the table. Although the association measures merely determine the degree of strength of association between the study varia...
Journal title:
Volume, issue
Pages -
Publication date: 2010